Big Data Benchmark Compendium
Authors
Abstract
The field of Big Data and related technologies is rapidly evolving. Consequently, many benchmarks are emerging, driven by academia and industry alike. As these benchmarks emphasize different aspects of Big Data and, in many cases, cover different technical platforms and use cases, it is extremely difficult to keep up with the pace of benchmark creation. Moreover, with the combination of large data volumes, heterogeneous data formats and varying processing velocities, it becomes complex to specify an architecture that best suits all application requirements. This makes the investigation and standardization of such systems very difficult. Therefore, the traditional approach of specifying a standardized benchmark with pre-defined workloads, which has been in use for years in transaction and analytical processing systems, is not trivial to employ for Big Data systems. This document provides a summary of existing benchmarks and those under development, gives a side-by-side comparison of their characteristics and discusses their pros and cons. The goal is to understand the current state of Big Data benchmarking and to guide practitioners in their approaches and use cases.
Similar Resources
EVALUATING EFFICIENCY OF BIG-BANG BIG-CRUNCH ALGORITHM IN BENCHMARK ENGINEERING OPTIMIZATION PROBLEMS
Engineering optimization needs easy-to-use and efficient optimization tools that can be employed for practical purposes. In this context, stochastic search techniques have a good reputation and wide acceptability as powerful tools for solving complex engineering optimization problems. However, the increased complexity of some metaheuristic algorithms sometimes makes it difficult for engineers t...
A Computational Reproducibility Benchmark
Creating and testing reproducible computational experiments is hard. Researchers must derive a compendium that encapsulates all the components needed to reproduce a result. Reviewers must unpack the encapsulated components, run them in an environment that could be different from the source environment, and verify the results. Although many tools support some aspect of reproducibility, there is ...
Feature Selection in Structural Health Monitoring Big Data Using a Meta-Heuristic Optimization Algorithm
This paper focuses on the processing of structural health monitoring (SHM) big data. Extracted features of a structure are reduced using an optimization algorithm to find a minimal subset of salient features by removing noisy, irrelevant and redundant data. The PSO-Harmony algorithm is introduced for feature selection to enhance the capability of the proposed method for processing the measure...
Setting the Direction for Big Data Benchmark Standards
We provide a summary of the outcomes from the Workshop on Big Data Benchmarking (WBDB2012) held on May 8-9, 2012 in San Jose, CA. The workshop discussed a number of issues related to big data benchmarking definitions and benchmark processes. The workshop was attended by 60 invitees representing 45 different organizations covering industry and academia. Attendees were chosen based on their exper...
Deep Data Anaylizing Application Based on Scale Space Theory in Big Data Environment
This paper introduces the basic scientific idea of multi-scale analysis to the field of big data, proposes a multi-scale framework for data analysis in a big data environment, presents the multi-scale algorithm framework of knowledge conversion theory and applies the algorithm framework to multi-dimension association rule analysis. The proposed multi-scale association rule analysis algorithm...